RecursiveMAS cuts multi-agent latency by passing embeddings instead of text, delivering up to 2.4x faster inference and 75% fewer tokens
Technology
Published on 16 May 2026

Agents “talk” through embeddings, not intermediate text, making communication faster and cheaper
Researchers from UIUC and Stanford propose RecursiveMAS, a multi-agent framework that replaces text-to-text communication with latent embedding passing. Instead of generating reasoning tokens at every step, agents loop continuous representations through RecursiveLink modules and only output text at the end. Tests across nine benchmarks show up to 2.4x faster inference, 75% token reduction by round three, and an 8.3% accuracy gain, with far cheaper training than full fine-tuning.
- Average accuracy rises 8.3% over the strongest baselines
- End-to-end inference speeds up 1.2x to 2.4x by avoiding stepwise text generation
- Token usage drops 75.6% by recursion round three versus Recursive-TextMAS
- Training updates only ~13M RecursiveLink parameters (about 0.31% of trainable size)
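To make the idea concrete, here is a toy sketch of latent passing between agents. It is not the authors' implementation: the names `ToyAgent`, `RecursiveLink` (modeled here as a plain linear map), `run_latent_mas`, the vector size, and the fake decoder are all assumptions for illustration. It only shows the control flow described above, where agents hand each other vectors for several rounds and text is decoded once at the end.

```python
import math
import random

# Toy sketch only: all classes, sizes, and the "decoder" below are
# hypothetical and are not taken from the RecursiveMAS paper or code.

DIM = 8      # size of the latent vector passed between agents (assumed)
ROUNDS = 3   # number of recursion rounds (assumed)


def matvec(matrix, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def random_matrix(rng, dim, scale=0.3):
    """Build a dim x dim matrix of small random weights."""
    return [[rng.uniform(-scale, scale) for _ in range(dim)] for _ in range(dim)]


class ToyAgent:
    """Stands in for a frozen model: maps a latent state to a new latent state."""

    def __init__(self, rng):
        self.weights = random_matrix(rng, DIM)

    def step(self, latent):
        return [math.tanh(v) for v in matvec(self.weights, latent)]


class RecursiveLink:
    """Stands in for the small trainable module between agents (linear map here)."""

    def __init__(self, rng):
        self.weights = random_matrix(rng, DIM)

    def __call__(self, latent):
        return matvec(self.weights, latent)


def run_latent_mas(agents, links, latent, rounds):
    """Pass latents through every agent for `rounds` rounds, then decode once."""
    decode_calls = 0
    for _ in range(rounds):
        for agent, link in zip(agents, links):
            latent = link(agent.step(latent))  # no text is produced here
    # Fake decoder: pick the index of the largest latent component.
    answer = "answer-" + str(max(range(DIM), key=lambda i: latent[i]))
    decode_calls += 1  # the only text-producing step
    return answer, decode_calls


rng = random.Random(0)
agents = [ToyAgent(rng) for _ in range(3)]
links = [RecursiveLink(rng) for _ in range(3)]
answer, decode_calls = run_latent_mas(agents, links, [1.0] * DIM, ROUNDS)

# A text-passing pipeline would instead decode once per agent per round.
text_baseline_decodes = len(agents) * ROUNDS
print(answer, decode_calls, text_baseline_decodes)
```

In this toy setup the latent pipeline decodes text once, while a text-passing pipeline with three agents and three rounds would decode nine times. That difference in decode steps is the kind of saving the reported token and latency figures point to, though the real numbers depend on the actual models and benchmarks.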
Read the full story at Venture Beat
This summarization was done by Beige for a story published on Venture Beat.
